Zhiyong Wu

LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation

May 27, 2026 (via arXiv)

UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment

May 22, 2026 (via arXiv)

OpenCompass: A Universal Evaluation Platform for Large Language Models

May 19, 2026 (via arXiv)

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

May 11, 2026 (via arXiv)

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis

Apr 24, 2026 (via arXiv)

Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model

Apr 21, 2026 (via arXiv)

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

Mar 24, 2026 (via arXiv)

PROMO: Promptable Outfitting for Efficient High-Fidelity Virtual Try-On

Mar 12, 2026 (via arXiv)

Kling-MotionControl Technical Report

Mar 03, 2026 (via arXiv)

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction

Jan 06, 2026 (via arXiv)